Frequency-domain Linear Prediction for Temporal Features

Authors

  • Marios Athineos
  • Daniel P.W. Ellis
Abstract

Current speech recognition systems uniformly employ short-time spectral analysis, usually over windows of 10-30 ms, as the basis for their acoustic representations. Any detail below this timescale is lost, and even temporal structure above this level is usually only weakly represented in the form of deltas etc. We address this limitation by proposing a novel representation of the temporal envelope in different frequency bands by exploring the dual of conventional linear prediction (LPC) when applied in the transform domain. With this technique of frequency-domain linear prediction (FDLP), the ‘poles’ of the model describe temporal, rather than spectral, peaks. By using analysis windows on the order of hundreds of milliseconds, the procedure automatically decides how to distribute poles to best model the temporal structure within the window. While this approach offers many possibilities for novel speech features, we experiment with one particular form, an index describing the ‘sharpness’ of individual poles within a window, and show a large relative word error rate improvement from 4.97% to 3.81% in a recognizer trained on general conversational telephone speech and tested on a small-vocabulary spontaneous numbers task. We analyze this improvement in terms of the confusion matrices and suggest how the newly modeled fine temporal structure may be helping.
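The core idea in the abstract, linear prediction applied to transform-domain coefficients so that the model's poles track temporal peaks, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a DCT-II as the transform, the autocorrelation method with a Levinson-Durbin recursion for the predictor, and a single full-band window with no sub-band splitting. The function names (`dct_ii`, `autocorrelation`, `levinson_durbin`, `fdlp_envelope`) and the toy signal are hypothetical, chosen only for illustration.

```python
import cmath
import math
import random


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def autocorrelation(seq, max_lag):
    """Unnormalized autocorrelation of seq for lags 0..max_lag."""
    return [
        sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; return (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k  # prediction error shrinks at each stage
    return a, err


def fdlp_envelope(x, order=8):
    """All-pole estimate of the temporal envelope of x (assumed sketch).

    The predictor is fitted to the DCT coefficients instead of the
    waveform, so the model 'spectrum' is indexed by time, not frequency.
    """
    coeffs = dct_ii(x)
    r = autocorrelation(coeffs, order)
    a, err = levinson_durbin(r, order)
    n_samples = len(x)
    envelope = []
    for n in range(n_samples):
        # Sample n maps to angle pi * (n + 0.5) / N on the unit circle.
        omega = math.pi * (n + 0.5) / n_samples
        denom = sum(a[j] * cmath.exp(-1j * omega * j) for j in range(order + 1))
        envelope.append(err / abs(denom) ** 2)
    return envelope


# Toy input (an assumption, not data from the paper): one click in weak noise.
rng = random.Random(0)
signal = [0.01 * rng.gauss(0.0, 1.0) for _ in range(256)]
signal[100] += 1.0
envelope = fdlp_envelope(signal, order=8)
peak_index = max(range(len(envelope)), key=envelope.__getitem__)
```

In this sketch a click at sample 100 becomes a cosine across the DCT coefficients, so the fitted poles cluster near the angle for that time instant and `peak_index` lands close to the click. The direct O(N^2) DCT keeps the example dependency-free; an FFT-based routine such as `scipy.fft.dct` would be the usual choice for longer windows.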

Similar resources

Temporal resolution analysis in frequency domain linear prediction.

Frequency domain linear prediction (FDLP) is a technique for auto-regressive modeling of Hilbert envelopes. In this letter, the resolution properties of the FDLP model are investigated using synthetic signals with impulses immersed in noise. The effects of various factors that influence the temporal resolution are studied, and this analysis suggests ways to improve the resolution of the FDLP envelo...

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs are clustered, and the attributes of the speech clusters are extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters are temporally tracked and temporal tra...

A Novel Temporal-Frequency Domain Error Concealment Method for Motion Jpeg

Motion-JPEG is a common video format for high-quality compression of motion images, using the JPEG standard for each frame of the video. During transmission through a noisy channel, some blocks of data are lost or corrupted, and the quality of the decompressed frames decreases. In this paper, for reconstruction of these blocks, several temporal-domain, spatial-domain, and frequency-domain error conceal...

Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions

In poor room acoustic conditions, speech signals received by a microphone may be corrupted by delayed versions of the signal that are reflected from the room surfaces (e.g. walls, floor). This phenomenon, reverberation, reduces the accuracy of automatic speaker verification systems by causing a mismatch between training and testing. Since reverberation causes temporal smearing to the signa...

PLP2: Autoregressive modeling of auditory-like 2-D spectro-temporal patterns

The temporal trajectories of the spectral energy in auditory critical bands over 250 ms segments are approximated by an all-pole model, the time-domain dual of conventional linear prediction. This quarter-second auditory spectro-temporal pattern is further smoothed by iterative alternation of spectral and temporal all-pole modeling. Just as Perceptual Linear Prediction (PLP) uses an autoregress...

Publication date: 2003